14 research outputs found

    Demystifying GPT Self-Repair for Code Generation

    Full text link
    Large Language Models (LLMs) have shown remarkable aptitude in code generation but still struggle on challenging programming tasks. Self-repair, in which the model debugs and fixes mistakes in its own code, has recently become a popular way to boost performance in these settings. However, only very limited studies exist in the literature on how and when self-repair works effectively, and one might wonder to what extent a model can really provide accurate feedback on why code is wrong when that code was generated by the same model. In this paper, we analyze GPT-3.5's and GPT-4's ability to perform self-repair on APPS, a challenging dataset consisting of diverse coding challenges. To do so, we first establish a new evaluation strategy, dubbed pass@t, that measures the pass rate of the tasks against the total number of tokens sampled from the model, enabling a fair comparison to purely sampling-based approaches. With this evaluation strategy, we find that the effectiveness of self-repair is only seen in GPT-4. We also observe that self-repair is bottlenecked by the feedback stage; by using GPT-4 to give feedback on programs generated by GPT-3.5, and expert human programmers to give feedback on programs generated by GPT-4, we unlock significant performance gains.
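    To make the pass@t idea concrete, here is a minimal sketch of how such a token-budgeted pass rate could be computed; the per-task log format (cumulative tokens sampled, whether the attempt passed) is an assumption for illustration, not the paper's implementation.

```python
# Hedged sketch of a pass@t-style curve: fraction of tasks solved within a
# given token budget. The log format is an illustrative assumption.
from typing import Dict, List, Tuple

# Each task maps to a list of (cumulative_tokens_sampled, passed) attempts,
# ordered by when they were sampled.
TaskLog = Dict[str, List[Tuple[int, bool]]]

def pass_at_t(logs: TaskLog, token_budget: int) -> float:
    """Fraction of tasks with at least one passing program found
    before the cumulative token budget is exhausted."""
    if not logs:
        return 0.0
    solved = 0
    for attempts in logs.values():
        if any(passed and tokens <= token_budget for tokens, passed in attempts):
            solved += 1
    return solved / len(logs)

# Example: compare a sampling-only run and a self-repair run at equal budgets.
sampling_only = {"task_1": [(800, False), (1600, True)], "task_2": [(900, False)]}
with_repair = {"task_1": [(800, False), (2000, True)], "task_2": [(900, False), (2100, True)]}
for budget in (1000, 2000, 3000):
    print(budget, pass_at_t(sampling_only, budget), pass_at_t(with_repair, budget))
```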

    CodeExp: Explanatory Code Document Generation

    Full text link
    Developing models that can automatically generate detailed code explanations can greatly benefit software maintenance and programming education. However, existing code-to-text generation models often produce only high-level summaries of code that do not capture implementation-level choices essential for these scenarios. To fill this gap, we propose the code explanation generation task. We first conducted a human study to identify the criteria for high-quality explanatory docstrings for code. Based on that, we collected and refined a large-scale code-docstring corpus and formulated automatic evaluation metrics that best match human assessments. Finally, we present a multi-stage fine-tuning strategy and baseline models for the task. Our experiments show that (1) our refined training dataset lets models achieve better performance on the explanation generation task than larger unrefined data (15x larger), and (2) fine-tuned models can generate well-structured long docstrings comparable to human-written ones. We envision that our training dataset, human-evaluation protocol, recommended metrics, and fine-tuning strategy can boost future code explanation research. The code and annotated data are available at https://github.com/subercui/CodeExp.
    Comment: Accepted in Findings of EMNLP 2022
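    As a rough illustration of the downstream use case, the sketch below runs a sequence-to-sequence code-to-text model to produce a docstring for a snippet. The Salesforce/codet5-base checkpoint is a stand-in assumption, not the paper's released model, and the input format may differ from what CodeExp fine-tuning expects.

```python
# Hedged sketch of inference with a fine-tuned code-to-docstring model.
# The checkpoint below is a placeholder; swap in a CodeExp fine-tuned model.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "Salesforce/codet5-base"  # assumption: not the paper's model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

code = '''def moving_average(xs, k):
    return [sum(xs[i:i + k]) / k for i in range(len(xs) - k + 1)]'''

inputs = tokenizer(code, return_tensors="pt", truncation=True)
outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```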

    Fault-Aware Neural Code Rankers

    Full text link
    Large language models (LLMs) have demonstrated an impressive ability to generate code for various programming tasks. In many instances, LLMs can generate a correct program for a task when given numerous trials. Consequently, a recent trend is to sample programs from a model at large scale and then filter/rank them based on program execution against a small number of known unit tests to select one candidate solution. However, these approaches assume that the unit tests are given and that the generated programs can be executed safely (even though they may perform arbitrary dangerous operations such as file manipulation). Both of these assumptions are impractical in real-world software development. In this paper, we propose CodeRanker, a neural ranker that can predict the correctness of a sampled program without executing it. CodeRanker is fault-aware, i.e., it is trained to predict different kinds of execution information, such as the exact compile/runtime error type (e.g., an IndexError or a TypeError). We show that CodeRanker can significantly increase the pass@1 accuracy of various code generation models (including Codex, GPT-Neo, and GPT-J) on the APPS, HumanEval, and MBPP datasets.
    Comment: In the proceedings of Advances in Neural Information Processing Systems, 2022
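    The sketch below illustrates execution-free reranking in the spirit of CodeRanker: each sampled program is scored by a learned classifier and the top-scoring candidate is returned, so pass@1 is measured on that single choice. The CorrectnessScorer interface and the toy DummyScorer are hypothetical stand-ins, not the trained fault-aware ranker.

```python
# Hedged sketch of selecting one candidate by predicted correctness,
# without ever executing the sampled programs.
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    source: str
    score: float  # predicted probability that the program passes

class CorrectnessScorer:
    """Hypothetical interface for a fault-aware classifier that predicts,
    from text alone, whether a program passes or which error it raises."""
    def predict_pass_probability(self, task_prompt: str, source: str) -> float:
        raise NotImplementedError

class DummyScorer(CorrectnessScorer):
    """Toy stand-in so the example runs; it carries no real signal."""
    def predict_pass_probability(self, task_prompt: str, source: str) -> float:
        return 1.0 / (1.0 + len(source) / 100.0)

def rank_candidates(scorer: CorrectnessScorer, task_prompt: str,
                    sources: List[str]) -> List[Candidate]:
    cands = [Candidate(src, scorer.predict_pass_probability(task_prompt, src))
             for src in sources]
    return sorted(cands, key=lambda c: c.score, reverse=True)

samples = ["def add(a, b): return a + b",
           "def add(a, b):\n    total = 0\n    for x in (a, b):\n        total += x\n    return total"]
ranked = rank_candidates(DummyScorer(), "Write add(a, b).", samples)
print(ranked[0].source)  # the single candidate that pass@1 is measured on
```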

    The Three Pillars of Machine Programming

    Get PDF
    In this position paper, we describe our vision of the future of machine programming through a categorical examination of three pillars of research. Those pillars are: (i) intention, (ii) invention, and (iii) adaptation. Intention emphasizes advancements in the human-to-computer and computer-to-machine-learning interfaces. Invention emphasizes the creation or refinement of algorithms or core hardware and software building blocks through machine learning (ML). Adaptation emphasizes advances in the use of ML-based constructs to autonomously evolve software.

    Neurosymbolic Learning for Robust and Reliable Intelligent Systems

    No full text
    This thesis shows that looking at intelligent systems through the lens of neurosymbolic models has several benefits over traditional deep learning approaches. Neurosymbolic models combine symbolic programmatic constructs, such as loops and conditionals, with continuous neural components. The symbolic part makes the model interpretable, generalizable, and robust, while the neural part handles the complexity of the intelligent system. Concretely, this thesis presents two classes of neurosymbolic models, state machines and neurosymbolic transformers, and evaluates them on two case studies: reinforcement-learning-based autonomous systems and multi-robot systems. These case studies show that the learned neurosymbolic models are human-readable, extrapolate to unseen scenarios, and can handle robust objectives in the specification. To learn these neurosymbolic models efficiently, we introduce neurosymbolic learning algorithms that leverage the latest techniques from machine learning and program synthesis.
    Ph.D. thesis
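    As a toy illustration of the state-machine class of models, the sketch below wraps small "neural" (here, linear) controllers in symbolic modes with readable transition guards. The modes, guards, and weights are invented for illustration and are far simpler than the learned models in the thesis.

```python
# Hedged sketch of a neurosymbolic state-machine policy: symbolic modes and
# transition guards around small continuous controllers.
import numpy as np

class Mode:
    def __init__(self, name, weights, bias):
        self.name = name
        self.weights = np.asarray(weights, dtype=float)  # continuous ("neural") part
        self.bias = float(bias)

    def action(self, obs):
        return float(self.weights @ obs + self.bias)

class StateMachinePolicy:
    def __init__(self, modes, transitions, start):
        self.modes = modes              # name -> Mode
        self.transitions = transitions  # name -> list of (guard(obs), next_name)
        self.current = start

    def step(self, obs):
        obs = np.asarray(obs, dtype=float)
        for guard, nxt in self.transitions.get(self.current, []):
            if guard(obs):              # symbolic, human-readable transition
                self.current = nxt
                break
        return self.modes[self.current].action(obs)

# Toy 1-D "approach then hold" controller; obs = [position, velocity].
policy = StateMachinePolicy(
    modes={"approach": Mode("approach", [-0.8, -0.3], 0.0),
           "hold": Mode("hold", [-0.1, -0.5], 0.0)},
    transitions={"approach": [(lambda o: abs(o[0]) < 0.05, "hold")]},
    start="approach",
)
print(policy.step([1.0, 0.0]), policy.step([0.01, 0.0]))
```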

    Synthesis of domain specific Clause Normal Form encoders for bit-vector solvers

    No full text
    Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from the student-submitted PDF version of the thesis. Includes bibliographical references (pages 61-66).
    SMT solvers are at the heart of a number of software engineering tools. These SMT solvers use a SAT solver as the back-end and convert the high-level constraints given by the user into low-level boolean formulas that can be efficiently mapped to CNF clauses and fed into a SAT solver. Current SMT solvers are designed to be general-purpose solvers suited to a wide range of problems. However, SAT solvers are highly non-deterministic, and hence it is difficult to optimize a general-purpose solver across all problems. In this thesis, we propose a system that can automatically generate parts of SMT solvers in a way that is tailored to particular problem domains. In particular, we target the translation from high-level constraints to CNF clauses, one of the crucial parts of all SMT solvers. We achieve this goal by using a combination of program synthesis and machine learning techniques. We use a program synthesis tool called Sketch to generate optimal encoding rules for this translation and then use auto-tuning to select only the subset of these encodings that actually improves performance for a particular class of problems. Using this technique, the thesis shows that we can improve upon the basic encoding strategy used by CVC4 (a state-of-the-art SMT solver). We can automatically generate variants of the solver tailored to different domains of problems represented in the bit-vector benchmark suite from the 2015 SMT competition.
    by Jeevana Priya Inala. M. Eng.
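    To make the translation step concrete, the sketch below shows a textbook Tseitin-style CNF encoding of the bit-vector constraint c = a & b in DIMACS-style signed literals. This is the generic encoding such a translation layer would emit, not the synthesized or auto-tuned rules from the thesis.

```python
# Hedged sketch of encoding the bitwise constraint c = a & b into CNF,
# one AND gate per bit, using DIMACS-style signed integer literals.
from typing import List

def fresh_vars(counter: List[int], width: int) -> List[int]:
    """Allocate `width` fresh boolean variable ids (DIMACS numbering)."""
    start = counter[0] + 1
    counter[0] += width
    return list(range(start, start + width))

def encode_bvand(a: List[int], b: List[int], c: List[int]) -> List[List[int]]:
    """Clauses for c = a & b. Per bit: (~c | a), (~c | b), (~a | ~b | c)."""
    clauses = []
    for ai, bi, ci in zip(a, b, c):
        clauses += [[-ci, ai], [-ci, bi], [-ai, -bi, ci]]
    return clauses

counter = [0]                  # running count of allocated variables
a = fresh_vars(counter, 4)     # 4-bit operands
b = fresh_vars(counter, 4)
c = fresh_vars(counter, 4)
print(encode_bvand(a, b, c))   # 12 clauses for a 4-bit AND
```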